Impact and influence of modern AI in metadata management

Yang, Wenli, Fu, Rui, Amin, Muhammad Bilal, Kang, Byeong

arXiv.org Artificial Intelligence

Metadata management plays a critical role in data governance, resource discovery, and decision-making in the data-driven era. While traditional metadata approaches have primarily focused on organization, classification, and resource reuse, the integration of modern artificial intelligence (AI) technologies has significantly transformed these processes. This paper investigates both traditional and AI-driven metadata approaches by examining open-source solutions, commercial tools, and research initiatives. A comparative analysis of traditional and AI-driven metadata management methods is provided, highlighting existing challenges and their impact on next-generation datasets. The paper also presents an innovative AI-assisted metadata management framework designed to address these challenges. This framework leverages advanced AI technologies to automate metadata generation, enhance governance, and improve the accessibility and usability of modern datasets. Finally, the paper outlines future directions for research and development, proposing opportunities to further advance metadata management in the context of AI-driven innovation and complex datasets.
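The framework's central promise, automated metadata generation, can be pictured with a minimal sketch. The `generate_metadata` function and its output schema below are illustrative assumptions, not the paper's actual framework: given a list of records, it infers the kind of descriptive metadata (per-field value types and completeness) that an AI-assisted pipeline would produce without manual curation.

```python
def generate_metadata(records):
    """Sketch of automated metadata generation: infer per-field value
    types and completeness from a list of record dicts. Illustrative
    only; a real pipeline would add semantic tags, lineage, etc."""
    fields = {k for r in records for k in r}
    meta = {}
    for f in sorted(fields):
        values = [r.get(f) for r in records]          # None where missing
        present = [v for v in values if v is not None]
        meta[f] = {
            "types": sorted({type(v).__name__ for v in present}),
            "completeness": len(present) / len(records),
        }
    return meta
```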


Data-Juicer 2.0: Cloud-Scale Adaptive Data Processing for Foundation Models

Chen, Daoyuan, Huang, Yilun, Pan, Xuchen, Jiang, Nana, Wang, Haibin, Ge, Ce, Chen, Yushuo, Zhang, Wenhao, Ma, Zhijian, Zhang, Yilei, Huang, Jun, Lin, Wei, Li, Yaliang, Ding, Bolin, Zhou, Jingren

arXiv.org Artificial Intelligence

The burgeoning field of foundation models necessitates advanced data processing mechanisms capable of harnessing the vast amounts of valuable, varied-type data utilized by these models. Nevertheless, the current landscape presents unique challenges that traditional data processing frameworks cannot handle effectively, especially with multimodal intricacies. In response, we present Data-Juicer 2.0, a new system offering rich data processing capabilities backed by over a hundred operators spanning various modalities like text, image, audio, and video. With seamless compatibility and dedicated optimizations for popular dataset hubs like Hugging Face and computing engines like Ray, Data-Juicer 2.0 enhances its predecessor in usability, efficiency, and programmability. It features an easily accessible user interface layer that supports decoupled Python interactions, RESTful APIs, and conversational commands. Alongside this, it contains a core runtime layer optimized for adaptive execution and management across different dataset scales, processing demands, and computational environments, while shielding unnecessary system details. Extensive empirical evaluations demonstrate Data-Juicer 2.0's remarkable performance and scalability, highlighting its capability to efficiently process tens of billions of data samples with tens of thousands of CPU cores. The system is publicly available, actively maintained, and broadly adopted in diverse research endeavors, practical applications, and real-world products such as Alibaba Cloud PAI.
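The operator-based design can be pictured with a toy sketch. This is not Data-Juicer's actual API; `run_pipeline` and the `(kind, fn)` operator tuples are invented for illustration of how a sample list flows through a declared chain of map and filter operators.

```python
def run_pipeline(samples, operators):
    """Minimal sketch of an operator-style data pipeline. Each operator
    is a (kind, fn) pair: 'map' transforms every sample, 'filter' keeps
    a sample only when fn returns True. Illustrative only."""
    for kind, fn in operators:
        if kind == "map":
            samples = [fn(s) for s in samples]
        elif kind == "filter":
            samples = [s for s in samples if fn(s)]
        else:
            raise ValueError(f"unknown operator kind: {kind}")
    return samples
```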


AntiLeak-Bench: Preventing Data Contamination by Automatically Constructing Benchmarks with Updated Real-World Knowledge

Wu, Xiaobao, Pan, Liangming, Xie, Yuxi, Zhou, Ruiwen, Zhao, Shuai, Ma, Yubo, Du, Mingzhe, Mao, Rui, Luu, Anh Tuan, Wang, William Yang

arXiv.org Artificial Intelligence

Data contamination hinders fair LLM evaluation by introducing test data into newer models' training sets. Existing studies address this challenge by updating benchmarks with newly collected data. However, they fail to guarantee contamination-free evaluation, as the newly collected data may contain pre-existing knowledge, and their benchmark updates rely on intensive human labor. To address these issues, in this paper we propose AntiLeak-Bench, an automated anti-leakage benchmarking framework. Instead of simply using newly collected data, we construct samples with explicitly new knowledge absent from LLMs' training sets, which thus ensures strictly contamination-free evaluation. We further design a fully automated workflow to build and update our benchmark without human labor. This significantly reduces the cost of benchmark maintenance to accommodate emerging LLMs. Through extensive experiments, we highlight that data contamination likely exists before LLMs' cutoff time and demonstrate that AntiLeak-Bench effectively overcomes this challenge.
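The core idea, building benchmark samples only from knowledge that postdates a model's training cutoff, can be sketched as follows. The fact schema, the `build_antileak_samples` helper, and the question template are hypothetical simplifications of the paper's automated workflow, invented for illustration:

```python
from datetime import date

def build_antileak_samples(facts, model_cutoff):
    """Keep only facts established after the model's training cutoff,
    so every benchmark question requires knowledge that cannot appear
    in the training data. `facts` is a list of dicts with keys
    'subject', 'relation', 'object', and 'since' (a date)."""
    fresh = [f for f in facts if f["since"] > model_cutoff]
    return [
        {"question": f"Who is the {f['relation']} of {f['subject']}?",
         "answer": f["object"]}
        for f in fresh
    ]
```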


Adaptive multiple optimal learning factors for neural network training

Challagundla, Jeshwanth

arXiv.org Artificial Intelligence

The University of Texas at Arlington, 2015. Supervising Professor: Michael Manry. There is always ambiguity in deciding the number of learning factors that is really required for training a Multi-Layer Perceptron. This thesis solves this problem by introducing a new method of adaptively changing the number of learning factors, computed based on the error change created per multiply. A new method is introduced for computing learning factors for weights grouped based on the curvature of the objective function. A method for linearly compressing large ill-conditioned Newton's Hessian matrices to smaller well-conditioned ones is shown. This thesis also shows that the proposed training algorithm adapts itself between two other algorithms in order to produce a better error decrease per multiply. The performance of the proposed algorithm is shown to be better than OWO-MOLF and Levenberg-Marquardt for most of the data sets.
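The idea of separate learning factors for curvature-based weight groups can be sketched under a diagonal-Hessian assumption. This is an illustration of the MOLF-style idea, not the thesis's exact algorithm; `group_learning_factors` and the quantile-style grouping are invented for this sketch.

```python
import numpy as np

def group_learning_factors(grad, hess_diag, n_groups=3):
    """Group weights by curvature (diagonal Hessian entries) and compute
    one learning factor per group: the optimal 1-D Newton step along the
    group's own gradient direction, z = (g.g) / (g.Hg)."""
    order = np.argsort(hess_diag)                 # low- to high-curvature
    groups = np.array_split(order, n_groups)
    factors = np.zeros(n_groups)
    for k, idx in enumerate(groups):
        g, h = grad[idx], hess_diag[idx]
        factors[k] = (g @ g) / max((g * g) @ h, 1e-12)
    return groups, factors
```

On an exactly quadratic loss this per-group factor is the exact line-search minimizer along each group's gradient, so each group's step is guaranteed not to increase the loss.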


Multiple Imputation for Biomedical Data using Monte Carlo Dropout Autoencoders

Miok, Kristian, Nguyen-Doan, Dong, Robnik-Šikonja, Marko, Zaharie, Daniela

arXiv.org Machine Learning

Due to complex experimental settings, missing values are common in biomedical data. To handle this issue, many methods have been proposed, from ignoring incomplete instances to various data imputation approaches. With the recent rise of deep neural networks, the field of missing data imputation has shifted towards modelling the data distribution. This paper presents an approach based on Monte Carlo dropout within (Variational) Autoencoders which not only adapts well to the distribution of the data but also allows generation of new data, tailored to each specific instance. The evaluation shows that the imputation error and predictive similarity can be improved with the proposed approach.
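The mechanism, keeping dropout active at inference and averaging many stochastic forward passes, can be sketched with a toy numpy autoencoder. The weights here are random and untrained (in the paper the autoencoder is of course trained first), so this only illustrates the MC-dropout averaging step; `mc_dropout_impute` and its parameters are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_dropout_impute(X, n_passes=50, hidden=8, p_drop=0.2):
    """Impute NaN entries by averaging stochastic forward passes of a
    small dropout autoencoder (random, untrained weights; illustration
    of the MC-dropout mechanism only)."""
    d = X.shape[1]
    W1 = rng.normal(0.0, 0.3, (d, hidden))
    W2 = rng.normal(0.0, 0.3, (hidden, d))
    X0 = np.where(np.isnan(X), np.nanmean(X, axis=0), X)  # initial fill
    passes = []
    for _ in range(n_passes):
        H = np.maximum(X0 @ W1, 0.0)                       # ReLU encoder
        # inverted dropout, kept active at inference (Monte Carlo dropout)
        H *= rng.binomial(1, 1 - p_drop, H.shape) / (1 - p_drop)
        passes.append(H @ W2)                              # linear decoder
    mean = np.mean(passes, axis=0)
    return np.where(np.isnan(X), mean, X)  # fill only the missing cells
```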


Adversarial Fault Tolerant Training for Deep Neural Networks

Duddu, Vasisht, Rao, D. Vijay, Balas, Valentina E.

arXiv.org Machine Learning

Deep Learning Accelerators are prone to faults which manifest in the form of errors in Neural Networks. Fault tolerance in Neural Networks is crucial in real-time safety-critical applications requiring computation for long durations. Neural Networks with high regularisation exhibit superior fault tolerance, however, at the cost of classification accuracy. In view of the difference in functionality, a Neural Network is modelled as two separate networks, i.e., the Feature Extractor with an unsupervised learning objective and the Classifier with a supervised learning objective. The traditional approach of training the entire network using a single supervised learning objective is insufficient to achieve the objectives of the individual components optimally. In this work, a novel multi-criteria objective function is proposed, combining unsupervised training of the Feature Extractor followed by supervised tuning with the Classifier Network. The unsupervised training solves two games simultaneously in the presence of adversary neural networks with objectives conflicting with the Feature Extractor's. The first game minimises the loss in reconstructing the input image for indistinguishability given the features from the Extractor, in the presence of a generative decoder. The second game solves a minimax constraint optimisation for distributional smoothening of the feature space to match a prior distribution, in the presence of a Discriminator network. The resultant strongly regularised Feature Extractor is combined with the Classifier Network for supervised fine-tuning. The proposed Adversarial Fault Tolerant Neural Network Training is scalable to large networks and is independent of the architecture. The evaluation on benchmark datasets FashionMNIST and CIFAR10 indicates that the resultant networks have high accuracy with superior tolerance to stuck-at-"0" faults compared to widely used regularisers.
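The fault model used in the evaluation, stuck-at-"0" faults, can be sketched directly: a random fraction of weights is forced to zero, and accuracy under the faulty weights is then compared across regularisers. The `stuck_at_zero` helper below is an illustrative assumption, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(1)

def stuck_at_zero(weights, fault_rate):
    """Inject stuck-at-'0' faults: force an (approximately) fault_rate
    fraction of weights to zero, modelling accelerator hardware faults."""
    faulty = weights.copy()
    mask = rng.random(faulty.shape) < fault_rate
    faulty[mask] = 0.0
    return faulty
```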


8-Valent Fuzzy Logic for Iris Recognition and Biometry

Popescu-Bodorin, N., Balas, V. E., Motoc, I. M.

arXiv.org Artificial Intelligence

This paper shows that maintaining logical consistency of an iris recognition system is a matter of finding a suitable partitioning of the input space into enrollable and unenrollable pairs by negotiating user comfort against the safety of the biometric system. In other words, consistent enrollment is mandatory in order to preserve system consistency. A fuzzy 3-valued disambiguated model of iris recognition is proposed and analyzed in terms of completeness, consistency, user comfort and biometric safety. It is also shown here that the fuzzy 3-valued model of iris recognition is hosted by an 8-valued Boolean algebra of modulo 8 integers that represents the computational formalization in which a biometric system (a software agent) can achieve the artificial understanding of iris recognition in a logically consistent manner.
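The 3-valued decision at the heart of the model can be sketched with two thresholds on a match score, partitioning pairs into match, non-match, and ambiguous (unenrollable) cases. The function name and threshold values are illustrative assumptions, not the paper's formalization:

```python
def iris_decision(score, t_reject=0.3, t_accept=0.7):
    """3-valued decision from an iris match score in [0, 1]: the band
    between the thresholds is the ambiguous / unenrollable region that
    the binary accept-reject model cannot represent consistently."""
    if score < t_reject:
        return "non-match"
    if score > t_accept:
        return "match"
    return "ambiguous"
```

Widening the ambiguous band trades user comfort (more pairs need re-enrollment) for system safety, which is the negotiation the abstract describes.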


From Cognitive Binary Logic to Cognitive Intelligent Agents

Popescu-Bodorin, Nicolaie, Balas, Valentina E.

arXiv.org Artificial Intelligence

The relation between self-awareness and intelligence is an open problem these days. Despite the fact that self-awareness is usually related to Emotional Intelligence, this is not the case here. The problem described in this paper is how to model an agent which knows (Cognitive) Binary Logic and which is also able to pass (without any mistake) a certain family of Turing Tests designed to verify its knowledge and its discourse about the modal states of truth corresponding to well-formed formulae within the language of Propositional Binary Logic.